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Abstract 



In an increasing number of applications, it is of interest to recover an approximately low-rank 
data matrix from noisy observations. This paper develops an unbiased risk estimate — holding 
If^ . in a Gaussian model — for any spectral estimator obeying some mild regularity assumptions. In 

particular, we give an unbiased risk estimate formula for singular value thresholding (SVT), a 
popular estimation strategy which applies a soft-thresholding rule to the singular values of the 
^ I ■ noisy observations. Among other things, our formulas offer a principled and automated way 

, of selecting rcgularization parameters in a variety of problems. In particular, we demonstrate 

the utility of the unbiased risk estimation for SVT-based denoising of real clinical cardiac MRI 
series data. We also give new results concerning the differentiability of certain matrix-valued 



functions. 

Keywords. Singular value thresholding, Stein's unbiased risk estimate (SURE), differentiabil- 
ity of eigenvalues and eigenvectors, magnetic resonance cardiac imaging. 

>: 

rn ■ 1 Introduction 



Suppose we have noisy observations Y about an m x n data matrix X'^ of interest. 



(n: y., = xo+T^,,, w,j'^M{oy), ' (1.1) 

1 . J — L, . . . , lb. 



We wish to estimate as accurately as possible. In this paper, we are concerned with situations 
^ ■ where the estimand has some structure, namely, X^ has low rank or is well approximated by a 

low-rank matrix. This assumption is often met in practice since the columns of X^ can be quite 
correlated. For instance, these columns may be individual frames in a video sequence, which are 
typically highly correlated. Another example concerns the acquisition of hyperspectral images in 
which each column of X^ is a 2D image at a given wavelength. In such settings, images at nearby 
wavelengths typically exhibit strong correlations. Hence, the special low-rank regression problem 
(jl.ip occurs in very many applications and is the object of numerous recent studies. 

Recently, promoting low-rank has been identified as a promising tool for denoising series of MR 
images, such as those that arise in functional MRI (fMRI) [33], relaxometry [22], cardiac MRI [36], 
NMR spectroscopy [SJ [25], and diffusion- weighted imaging [151 I24j . among others. In dynamic 
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applications like cine cardiac imaging, where "movies" of the beating heart are created, neighboring 
tissues tend to exhibit similar motion profiles through time due to their physical connectedness. 
The diversity of temporal behaviors in this settings will, by nature, be limited, and the Casorati 
matrix (a matrix whose columns comprise vectorized frames of the image series) formed from this 
data will be low-rank [T7]. For example, in breath-hold cine cardiac imaging, background tissue 
is essentially static and the large submatrix of the Casoratian corresponding to this region is very 
well approximated by a rank-1 matrix. 

1.1 Singular value thresholding 

Whenever the object of interest has (approximately) low rank, it is possible to improve upon the 
naive estimate Xq = 1^ by regularizing the maximum likelihood. A natural approach consists in 
truncating the singular value decomposition of the observed matrix Y, and solve 

SVHTa(I^) = arg min -11^ - + A rank(X), (1.2) 

where A a positive scalar. As is well known, if 

min(m,n) 

y = uj:v*= Y1 (1-3) 

i=l 

is a singular value decomposition for Y, the solution is given by retaining only the part of the 
expansion with singular values exceeding A, 

min(m,n) 

SVHTA(r)= >A)tii<. 

i=l 

In other words, one applies a hard-thresholding rule to the singular values of the observed matrix 
Y. Such an estimator is discontinuous in Y and a popular alternative approach applies, instead, a 
soft-thresholding rule to the singular values: 

min(m,n) 

S\T^{Y)= Yl {cT^-^)+u,v*■, (1.4) 

i=l 

that is, we shrink the singular values towards zero by a constant amount A. Here, the estimate 
SVT;j("K) is Lipschitz continuous. This follows from the fact that the singular value thresholding 
operation (|1.4p is the prox of the nuclear norm || • ||* (the nuclear norm of a matrix is sum of its 
singular values), i.e. is the unique solution to 

min ^\\Y -X\\l + X\\X\U. (1.5) 

The focus of this paper is on this smoother estimator referred to as the singular value thresholding 
(SVT) estimate. 
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1.2 A SURE formula 



The classical question is, of course, how much shrinkage should be applied. Too much shrinkage 
results in a large bias while too little results in a high variance. To find the correct trade-off, it would 
be desirable to have a method that would allow us to compare the quality of estimation for different 
values of the parameter A. Ideally, we would like to select A as to minimize the mean-squared error 
or risk 

MSE(A) =E||X° -SVTA(r)|||. (1.6) 

Unfortunately, this cannot be achieved since the expectation in ()1.6p depends on the true X^, and 
is thus unknown. Luckily, when the observations follow the model (jl.ip . it is possible to construct 
an unbiased estimate of the risk, namely, Stein's Unbiased Risk Estimate (SURE) [32] given by 

min(m,n) 

SURE(SVTa) (1") = -mnr2+ ^ mm{X^,af) + 2T'^dw {SYTx{Y)), (1.7) 

1=1 

where {ai}^^i denote the singular values of Y. Above, 'div' is the divergence of the nonlinear 
mapping SVTa, which is to be interpreted in a weak sense. Roughly speaking, it can fail to exist 
on negligible sets. The main contribution of this paper is to provide a closed-form expression for 
the divergence of this estimator. We prove that in the real-valued case. 



min(m,n) 



min(m,n) 



div(SVTA(l-))= Yl %.>A) + |m-n|(l-A) +2 ^ ^"^7" (1.8) 
i=i L \ ^^J+] ,^^...^1 (^j 

when Y is simple — i.e., has no repeated singular values — and otherwise, say, is a valid expression 
for the weak divergence. Hence, this simple formula can be used in (jLTh . and ultimately leads to 
the determination of a suitable threshold level by minimizing the estimate of the risk, which only 
depends upon the observed data. 

We pause to present some numerical experiments to assess the quality of SURE as an estimate 
of the risk. We work with four matrices Xf, i = 1, . . . , 4, of size 200 x 500. Here, Xf has full rank; 
X^ has rank 100; X^ has rank 10; and X^ has singular values equal to CTj = \/200/(l + e(*"^°°)/2°), 
i = 1,...,200. Finally, each matrix is normalized so that = 1, i = 1,...,4. Next, two 

methods are used to estimate the risk (jl.6p of SVTa seen as a function of A. The first methods uses 



3=1 

where {l^'-"'^}^2=i independent samples drawn from model (jl.ip with X^ = Xj'. We set this to 
be the value of reference. The second uses SURE(SVTa)(1^), where Y is drawn from model (jl.ip 
independently from {Y^^^^}. Finally, in each case we work with values of the signal-to-noise ratio, 
defined here as SNR = \\X^\\p/^/mnT = l/^/mriT, and set SNR = 0.5,1,2 and 4. The results 
are shown in Figure [H where one can see that SURE remains very close to the true value of 
the risk, even though it is calculated from a single observation. Matlab code reproducing the 
figures is available and computing SURE formulas for various spectral estimators is available at 



http : //www-stat . Stanford. edu/~candes/SURE_SVT_code .zip 
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(c) (d) 

Figure 1: Comparison of the risk estimate using Monte Carlo (solid line) and SURE (cross) versus 
A X T for G M200X500 g^R = 0.5; (b) SNR = 1; (c) SNR = 2; and (d) SNR = 4. 

The colors indicate the matrices used in each case. As we can see, SURE follows closely the Monte 
Carlo estimates, even though it requires only one observation to estimate the risk. 
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In MR applications, observations can take on complex values. Indeed, MRI scanners employ 
a process known as quadrature detection (two orthogonal phase-sensitive detectors) to observe 
both the magnitude and phase of rotating magnetization that is induced by radio-frequency (RF) 
excitation |18j . Quadrature detection both increases SNR and allows for the encoding of motion 
information like flow. MRI noise, which is Gaussian, is also complex valued. In general, MRI noise 
can also be assumed iid, noting that inter-channel correlation in parallel MRI data can be removed 
via Cholesky pre-whitening. Thus, model (jl.ip has to be modified as 

Y = X^ + W, Re{Wij), lm{Wij) ~ AA(0,r2), (1.9) 
where the real and imaginary parts are also independent. In this case, SURE becomes 

min{m,n) 

SURE(SVTa)(1^) = -2mnT^ + mm{X^,af) + 2T^dw (SVTa (!")). 

1=1 

We also provide an expression for the weak divergence in this context, namely, 

min(m,n) ^ / \ \ i min(m,n) 



div(SVTA(l^)) = Yl %i > A) + (2|m-n| + 1) (l ) +4 ^ 



ai{ai - A)+ 
erf - ai 



(1-10) 

when Y is simple, and otherwise. This formula can be readily applied to MR data as we shall see 
in Section [2j The decomposition into real and imaginary parts would suggest that the divergence 
would be proportional to twice the divergence in the real case. However, this is not the case. The 
most significant difference is that there is a contribution of the inverse of the singular values even 
when the matrix is square. 



1.3 Extensions 

The formulae (|1.8p - (|1.10p have applications beyond SURE. For instance, the divergence of an 
estimation procedure arises naturally when trying to estimate the degrees of freedom associated 
with it. Consider the estimate SVTa(I^) of when the observation Y comes from an additive 
error model. The degrees of freedom |1H [32] are defined as 

m n 

df(SVTA) = cov {SYT x{Y),j,Yi,) . 

1=1 j=i 

When the observations Y follow model (jl.ip . it is possible to write the degrees of freedom as 

df(SVTA) = E{div(SVTA(l^))}. 

Therefore, the expression we provide is also useful to estimate or calculate the degrees of freedom 
of singular value thresholding. 

Finally, although our work focuses on SVT, our methods extend beyond this particular estima- 
tor. Consider estimators given by spectral functions. These act on the singular values and take the 
form 

min{m,n) 

f{Y)= Y h{'yi)nrvl:=Uf{^)V\ for all ^ G M'"><^ (1.11) 

i=l 
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where Y = IfSV* is any SVD (SVT is in this class). Our methods show that these functions 
admit a SURE formula, given by 



SURE(/)(F) 



mnr^ + \\f{Y) - Vfp + Ir^div {f{Y)) 



and that under mild assumptions there exists a closed-form for their divergence: 



min(m,n) 



) 



min(m,n) 



div(/(X))= Yl (ni^^) + \m-n 



+ 2 E 




(1.12) 



i=l 



This is of interest because such estimators arise naturally in regularized regression problems. For 
instance, let J : M™^" i— )• M be a lower semi-continuous, proper convex function of the form 



is spectral. Hence, (jl.lSp can be used broadly. 

1.4 Connections to other works 

During the preparation of this manuscript, the conference paper [8] came to our attention and 
we would like to point out some differences between this work and ours. In [8], the authors 
propose to recursively estimate the divergence of an estimator given by (jl.l3p . This is done by 
using proximal splitting algorithms. To this end, they provide an expression for the directional 
subdifferential of a matrix-valued spectral function. The divergence is then estimated by averaging 
subdifferentials taken along random directions at each iteration of the proximal algorithm. Our 
approach is obviously different since we provide closed-form formulas for the divergence that are 
simple to manipulate and easy to evaluate. This has clear numerical and conceptual advantages. 
In particular, since we have a closed-form expression for SURE, it becomes easier to understand 
the risk of a family of estimators. Finally, we also address the case of complex-valued data that 
seems out of the scope of [8]. 

1.5 Content 

The structure of the paper is as follows. In Section [2] we present applications in MRI illustrating the 
advantages of choosing the threshold in a disciplined fashion. Section [3] provides precise statements, 
and a rigorous justification for (jl.Sp and (jl.lOp . Section H] deals with the differentiability of spectral 
functions (11. lip , and thus supports Section [3l We conclude with a short discussion of our results, 
and potential research directions in Section \5\ 



min{m,n) 



J{X)= Y J^^<^)) 



Then, for A > the estimator 




(1.13) 
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2 Applications in Magnetic Resonance Imaging (MRI) 



SVT is a computationally-straightforward, yet, powerful denoising strategy for MRI applications 
where spatio-temporal/par ameteric behavior is either a priori unknown or else defined accordingly 
to a complicated nonlinear model that may be numerically challenging to work with. A practi- 
cal challenge in most rank-based estimation problems in MRI (and inverse problems in general), 
however, lies in the selection of the regularization or threshold parameter. Defining an automated, 
disciplined, and consistent methodology for selecting this parameter inherently improves clinical 
workflow both by accelerating the tuning process through the use of optimization-based, rather 
than heuristic, search strategies and by freeing the MRI scanner technician so that they can fo- 
cus on other patient-specific tasks. Moreover, eliminating the human element from the denoising 
process mitigates inter- and intra-operator variability, and thus raises diagnostic confidence in the 
denoising results since images are wholly reproducible. 

2.1 Noise reduction in cardiac MRI series 

Here, we demonstrate a potential use of the SVT unbiased risk estimate for automated and opti- 
mized denoising of dynamic cardiac MRI series. Dynamic cardiac imaging is performed either 
in cine or real-time mode. Cine MRI, the clinical gold-standard for measuring cardiac func- 
tion/volumetrics [3], produces a movie of roughly 20 cardiac phases over a single cardiac cycle 
(heart beat). However, by exploiting the semi-periodic nature of cardiac motion, it is actually 
formed over many heart beats. Cine sampling is gated to a patient's heart beat, and as each data 
measurement is captured it is associated with a particular cardiac phase. This process continues 
until enough data has been collected such that all image frames are complete. Typically, an entire 
2D cine cardiac MRI series is acquired within a single breath hold (less than 30 sees). 

In real-time cardiac MRI, an image series covering many heart beats is generated. Although 
the perceived temporal resolution of this series is coarser than that of a cine series, the "temporal 
footprint" of each frame is actually shorter since it is only comprised of data from one cardiac 
cycle. Real-time cardiac MRI is of increasing clinical interest for studying transient functional 
processes such as first-pass myocardial perfusion, where the hemodynamics of an intravenously- 
injected contrast bolus are visualized. First-pass myocardial perfusion proffers visualization of 
both damage to heart muscle as well as coronary artery blockage. Typically, a real-time cardiac 
MRI study will exceed feasible breath- hold time and respiratory motion may be visible. 

Simultaneously achieving high spatial resolution and SNR is challenging in both cine and real- 
time cardiac MRI due to the dynamic nature of the target signal. Signal averaging (NEX > 1), a 
standard MRI techniques for noise reduction, is infeasible in real-time imaging and undesirable in 
cine imaging due to potential misregistration artifacts, since it cannot be executed within a single 
breath-hold. Real-time acquisitions, which are generally based on gradient recalled echo (GRE) 
protocols, have inherently low SNR due to their use of very short repetition (TR) and echo times 
(TE). Cine acquisitions use either standard GRE or balanced steady-state free precession (bSSFP) 
protocols. When imaging with 1.5 T (Tesla) magnets, bSSFP cine sequences can often yield suffi- 
cient SNR. However, unlike many other MRI protocols, SSFP sequences do not trivially gain SNR 
when moved to higher-field systems (> 3.0 T) [26] which is an emerging clinical trend. As magnetic 
field strength is increased, the RF excitation flip angle used in a bSSFP sequence must be lowered 
to adhere to RF power deposition (SAR) safety limits. This results in weaker signal excitation 
which can mitigate gains in bulk magnetization. Poor receiver coil sensitivity at the heart's medial 
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location and signal loss due to iron overload (hemochromatosis), among other factors, can further 
reduce SNR in both imaging strategies. Beyond confounding visual radiological evaluation, even 
moderate noise levels in a cardiac image can also degrade the performance of automated segmen- 
tation methods that are used for quantitative cardiac function evaluation. Therefore, effective 
denoising techniques that preserve the morphological and dynamic profiles of cardiac image series 
are clinical valuable. 

Consider a series of t separate nxn 2D MR images. To utilize SVT to denoise this data, it must 
first be transformed into a x t Casorati matrix, Y = + W. In many MRI scenarios, spatial 
dimensionality will greatly exceed temporal/parametric dimensionality (n^ ^ t) and Y typically 
is a very thin matrix. Due to limited degrees-of-freedom, even optimally-parameterized SVT may 
result in temporal/parametric blurring. One solution to this problem is to analyze the image series 
in a spatial block- wise manner (see Figure [2]) rather than globally [36] . 




9000 p 
8000 [- 
7000 1 
















JO 00 1 
4000 ■ 
300ol- 


Low Ran 


k 


zooo[- 








ioool 


/ 

















S 10 1 

Index 


S 2 



Low Rank 



Figure 2: Singular values of the Casorati matrices formed from three different 8 x 8 x 19 block 
sets extracted from a cine cardiac sequence, including: a static region (cyan); a dynamic region 
(yellow); and background noise (magenta). 

Let Rb be a binary operator that extracts k'^ rows from a matrix corresponding to a k x k 
spatial block, specified by index 6, within each image. Block-wise SVT can be defined as 

BSVTA(r) = Yl SYTxiRbY), (2.1) 

ben 

where denotes a set of (potentially overlapping) blocks that uniformly tiles the image domain, 
i.e., "^ben ^l^b = cln'^xn'^i c > 0. In words, (j2.ip performs SVT on a family of submatrices of Y 
and accumulates a weighted sum of the results. Of course, for k = n and \Vt\ = 1, (|2.ip resorts to 
standard SVT. The unbiased risk estimator developed in the previous section readily extends for 
this generalized SVT model. By linearity, 

divBSVTA(r) = c-^YdiYRlSYTx{RbY). (2.2) 
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Extending the identity (10) from [2j for matrices then asserts 

ij '"^ ij ^ 

Now, observing that R^Rl = 42x^2, Mb G Vt, we see that divi?^ SVTa(-R61^) = div SVTA(-Rfel"), 
where the latter term is as in (fLSjl for F G X* or (fTTOD for F G Define the mean-squared 

error of BSVTa as 

MSE(A) =E||Xo -BSVTA(r)|||., (2.4) 
and the singular value decomposition RhY = [/feS;,V^*. An unbiased estimator of (|2.4p is 

2 

SURE(BSVTa)(1^) = -^mnr'^+c- 



^RlUtnx{^b)VC 



ben 



+2T^c"^^dwSYTx{RbY), (2.5) 



ben 



where n{T.i,)ij = min(A,Sij) = l{i = i)min(A,cri) and ai = (Sb)^^. For Y G IR"'^*, /3 = 1; 
otherwise, for Y G C"'^*, 13 = 2. 

We now present three MRI examples, one on simulated data and two on real clinical data, and 
demonstrate the utility of the developed unbiased risk estimators for automatic parameterization 
of SVT-based denoising. 

Example 1: PINCAT Numerical Phantom 

Initial evaluation of SURE for SVT was performed on the physiologically-improved NCAT (PIN- 
CAT) numerical phantom |31] . which simulates a first-pass myocardial perfusion real-time MRI 
series. In particularly, the free-breathing model (n = 128, t = 50) available in the kt-SLR software 
package [19] was adapted to include a spatially-smooth and temporally-varying phase field such 
that the target signal was complex valued. Complex iid Gaussian noise (r = 30) was then added 
to the image data. Both standard, or global, and block-wise SVT were each executed at 101 values 
equispaced over [10"\10'^]. Block -wise SVT was performed with k = 7 and Q comprising one block 
for each image pixel, under periodic boundary conditions, such that |0| = n^. 

Figure [3] shows early, middle, and late time frames from the PINCAT series for the noise- 
free (truth), noisy, and SVT-denoising results. The threshold values used to generate the SVT 
results were selected as the MSE/SURE-minimizers in Figure 0^. Also observe in Figure that 
SURE provides a precise estimate of MSE for both the global and block-wise SVT models. The 
high accuracy exhibited in this case can be attributed to the high dimensionality of the MRI 
series data. Note that both global and block-wise SVT yield strong noise reduction generally 
preserve both morphology and contrast. However, block-wise SVT simultaneously demonstrates 
a greater degree of noise removal and fidelity to ground truth. In particular, note the relative 
contrast of the various components of the heart in late frame results. The first observation is 
corroborated by Figure S^, which shows that block-wise SVT is able to achieve a lower MSE than 
global SVT. The second observation is corroborated by Figures Hb-c, which show the worst-case 
absolute error (compared to ground truth) through time for the two SVT setups. Clearly, global 
SVT exhibits higher residual error than block- wise SVT, particularly in areas of high motion near 
the myocardium. The difference between these results can be attributed to the matrix anisotropy 
problem discussed earlier in this section. Thus, SURE can also be used to automatically-determine 
the block-size setting as well as the threshold value. 
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Figure 3: Comparison of MSE-optimized global and block-wise SVT for denoising the complex- 
valued numerical PINCAT phantom. All images are identically windowed and leveled. 



Example 2: Cine Cardiac Imaging 

In the second experiment, a bSSFP long-axis cine sequence (n = 192, t = 19) acquired at 1.5 T 
using an phased-array cardiac receiver coil {1 = 8 channels) was denoised via SVT. The Casorati 
matrices for individual data channels, which have undergone pre-whitening, are stacked to form 
a single Zn^ x t matrix. Assuming that the spatial sensitivity of the receiver channels does not 
vary substantially through time, the rank of this composite matrix will be equal to that for any 
individual channel. For visualization purposes, multi-channel denoising results were compressed 
by calculating the root-sum-of-squares image across the channel dimension. Background noise was 
determined using a manually-drawn region-of-interest (ROI) to have r = 0.67. In this example, 
block- wise SVT was performed with k = 5 and used the same block set as in Example 1. The 
parameter sweep was executed for 101 values equispaced over [10"^, 10^]. In this example, the 
ground truth is unknown, and thus the MSE cannot be computed to be compared against SURE. 
However, inspection of Figure E^i reveals that the same qualitative behavior seen for SURE in the 
numerical phantom example is observed here as well. 

Figure [5] demonstrates the effect of sub-optimally selecting the threshold value for both global 
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(a) (b) (c) 



Figure 4: (a) Plots of MSE and SURE for global and block-wise SVT as a function of threshold 
value, A; the minimizing parameters were used to generate the images in Fig. [3l (b) and (c) show 
the worst-case error through time of the MSE/SURE-optimized global and block- wise SVT results, 
respectively. 

and block-wise SVT. In particular, over- and under-estimates, along with the SURE-optimal values 
are applied within SVT. For both SVT strategies, under-estimation of A fails to remove noise as 
expected. Conversely, over-estimation of A leads to spatio-temporal blurring by both strategies 
albeit in different manners. In global SVT, temporal blurring occurs near areas of high motion like 
the tricuspid valve of the heart (indicated via the red arrow). However, less active areas like the 
pulmonary branching vessels, seen just above the heart, are undistorted. Some background noise 
also remains visible. In the block- wise SVT result, these characteristics are essentially reversed — 
minimal temporal blurring is induced but there is noticeable spatial blurring and loss of low-contrast 
static features. Noise is, however, strongly reduced. A compromise between these extremes is made 
by the SURE-optimized results. As suggested by Figure [6^, block-wise SVT offers stronger noise 
suppression (see Figures [6)3-c) without inducing spatial or temporal blur, the latter which is seen 
even in the optimized global SVT result. This example highlights the sensitivity of SVT-denoising 
performance on parameter selection, and the importance of having a disciplined framework for 
choosing threshold values. Also of note is that, even after empirical pre-whitening, multi-channel 
MRI noise may not be exactly iid Gaussian. Nonetheless, SURE allows production of extremely 
reliable results. 

Example 3: First-Pass Myocardial Perfusion 

The third denoising experiment was performed on the single-channel, first-pass myocardial perfusion 
real-time MRI sequence (n^ = 190, Uy = 90, t = 70) that is also provided with the kt-SLR software 
package [19]. To simulate a low-field acquisition (1.0 T), such as with an open or interventional 
MRI system, complex Gaussian noise was added to the data originally acquired at 3.0 T using a 
GRE sequence. Following this addition, the noise level was estimated at r = 0.105. Since the 
duration of this exam exceeded feasible breath-hold time, there is substantial respiratory motion at 
the end of the series. For this example, only block-wise SVT (fc = 6) denoising was performed, and 
executed at 101 threshold values equispaced over [10^3 xqS]. As in Example 2, Q comprises one 
block for each image pixel, and only SURE can be computed due to the absence of ground truth. 
Mirroring Figure \5\ Figure [7] demonstrates the effect of threshold selection on different image 
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A Too Small A Optimized A Too Large 




Figure 5: Global and block-wise SVT denoising results for a bSSFP long-axis cine cardiac MRI 
series. The effect of selecting too high or low a threshold value, as compared to the MSE/SURE 
optimal value is demonstrated. The red arrow identifies the tricuspid valve of the heart, which 
exhibit a large degree of motion through the sequence. All images are identically windowed and 
leveled. 

frames from the perfusion series. In particular, images showing the transition of contrast from the 
right to left vertical, as well as later-stage onset of myocardial blush are shown. Of the 101 tested 
threshold values, 5 threshold settings corresponding to 2 over-estimation, 2 under-estimation, and 
the SURE-optimal value are depicted (see Figure E]). Under all display settings, temporal contrast 
dynamics are well preserved; however, threshold setting has a marked effect on noise level, spatial 
resolution, and contrast. As before, employing SVT with an under-estimated threshold fails to 
remove noise, as evident in the first two columns of Figure [71 Conversely, employing SVT with an 
over-estimated threshold will remove noise but also induce both spatial blurring and contrast loss. 
This is particularly evident in the 4th image row (t = 20). At high threshold values, the contrast of 
pulmonary vasculature is diminished, and there is visible blurring of the papillary muscles, which 
appear as dark spots in the contrast-enhanced left ventricle. Finally, the SVT result obtained with 
the SURE-optimal threshold represents an ideal balance between these extremes, offering strong 
noise reduction without degradation of important anatomical features. As suggested by Figured! 
there may only exist a narrow parameter window in which effective denoising can be achieved, and 
the presented unbiased risk estimators can greatly aid in the identification of these optimal settings. 
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(a) 



(c) 



Figure 6: (a) SURE for global and block-wise SVT as a function of threshold value, A; the min- 
imizing parameters were used to generate the images in Fig. \E\ (b) and (c) show enlargements 
of the pulmonary branching vessels for the SURE-optimized global and block-wise SVT results in 
Fig. [5l respectively. Red and green arrows highlight improved vessel conspicuity. Both images are 
windowed and leveled identically. 

2.2 Extensions and generalizations 

The three examples in the previous subsection demonstrate the utility of the unbiased risk estima- 
tion for SVT-based denoising of cardiac MRI series data. Of course, this methodology can also be 
applied to any other dynamic or parametric MRI series, including those discussed earlier such as 
functional MRI. Although exhaustive search over a wide range of threshold values was performed 
for the sake of exposition, the observed unimodality of the risk functional suggests that practical 
automated parameter selection for SVT denoising could be accelerated using a bisection strategy 
such as golden section search. 

In addition to optimizing over a range of threshold values, unbiased risk estimation can also 
be used to identify the optimal block-size when using the local SVT model in (12. ip . For example, 
one could imagine performing a sweep of A over a prescribed range of values for a collection of 
block-size values, and computing the lower envelope of this set of risk measures to investigate best 
achievable performance as a function of block-size. In addition to simply optimizing a denoising 
setup, this type of information can be used to guide the development of rank-based reconstruction 
frameworks for undersampled dynamic and/or parallel MRI. 

3 SURE Formulas for Singular Value Thresholding 

In [32], Stein gave a formula for an unbiased estimate of the mean-squared error of an estimator 
obeying a weak differentiability assumption and mild integrability conditions. Before reviewing this, 
recall that a function (7 : i— t- M is said to be weakly differentiable with respect to the variable 
Xi if there exists /i : 1— )• M such that for all compactly supported and infinitely differentiable 
functions if, 
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Figure 7: Block-wise SVT denoising results for GRE short-axis first-pass myocardial perfusion 
sequence. The effect of selecting too high or low a threshold value, as compared to the MSE/SURE 
optimal value, is demonstrated. All images are identically windowed and leveled. 



Roughly speaking, the derivatives can fail to exist over regions of Lebesgue measure zero. 



Proposition 3.1 I14j). Suppose that Y-ij M{Xij, 1). Consider an estimator X of the fi 



orm 



X = Y + g{Y), where gij : 



I— 7- M is weakly differentiable with respect to Yij and 
d 



E\\Yijgij{Y)\ + 



dYi 



< oo, 
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io' 10° 10' 10' 

.1 

Figure 8: SURE for block-wise SVT as a function of threshold value, A; the red dot indicates 
the optimal threshold value used in Fig. [71 and green dots correspond to the employed under- and 
over-estimated values. 

for € X := {1, . . . , m} x {1, . . . n}. Then 

E \\X -X\\l= E{mn + 2div {g{Y)) + \\g{Y)\\l}. (3.1) 

The rest of this section establishes that SVT obeys these assumptions whereas the deduction 
of a closed- form expression for its divergence is deferred to Section [H To achieve this, we record 
a simple fact that will allow us to study both the real- and complex-valued cases within the same 
framework. We identify below R™^"- with M"*". 

Lemma 3.2. Let g : M"*^" ^ M™^'" with components gij : M'"^" ^ R. Assume g{0) = 0. If g is 
Lipschitz with constant L > 0, i.e., 

yX,Y: \\g{X)-g{Y)\\<L\\X-Y\\, 

for some norm in M™^", then each gij is weakly differentiahle Lebesgue-a.e. in M™^"-. Furthermore, 
ifY& jgmxn. distributed as in (|l.ip . then for all pairs {i,j), 

K {\Yij gij {Y)\} < 00 and E 

Proof Since all norms are equivalent in finite dimensions, g is also Lipschitz with respect to the 
Frobenius norm. Hence, 

yX,Y: \\g{X)-g{Y)\\F<L\\X-Y\\F =^ \gij{X) - g,,{Y)\ < L\\X - Y\\f, (3.2) 

for some constant L. By Rademacher's theorem (see Section 3.1.2 in [12]) each component is 
differentiahle Lebesgue-a.e. in R™^". Furthermore, these components are weakly differentiahle 



d 
dY 



■9^JiY) 



< 00. 
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(see Theorem 1 and 2, Section 6.2, in |12j). and both the derivatives and weak derivatives are 
Lebesgue-a.e. equal. 

Regarding the integrabihty, since g{0) = 0, ()3.2p yields ||(7(1^)|||, < and we deduce 

E\\g(Y)fp < L^E\\Yfp = L^{mn + WX^fp) < oo. 
Furthermore, the Cauchy-Schwarz inequality gives 

n\y^m{Y)\} < ny^Y"ng^3{Y?Y'"' < l [{i+xi,^){mn+\\x4i)\''' < oo. 

Finally, (|3.2p asserts that the derivatives of gij — whenever they exist — are bounded by L. Hence, 



E 



which concludes the proof. 



d 



dYi 



< E 



d 



1/2 



< L, 



For pedagogical reasons, we work with singular value thresholding first, before discussing more 
general spectral estimators in Section [H Writing g(Y) = SVTa(I^) — Y then, the conditions 
become 



¥.\\Yi,gi,{Y)\ + 



dY, 



<¥.{\YijS\Tx{Y),j\ + 



d 



dY, 



■SVTA(l^)i 



and we only need to study the weak differentiability and integrabihty of SVT. 



3.1 The real case 

Lemma 3.3. The mapping SVTa, A > 0, obeys the assumptions of Proposition COl 

Proof By definition, SVT^ is the proximity function of the nuclear norm. Since the nuclear norm 
is a convex, proper and lower semi-continuous function, SVT^ is Lipschitz and non-expansive (see 
Chapter 6, Section 30 in [3D]). Applying Lemma [HT^ proves the claim. ■ 

The mapping SVTa may not be differentiable everywhere. Fortunately, since we are interested 
in weak differentiability, we can discard sets of Lebesgue measure zero. For instance, model (jl.ip 
guarantees that Y is simple and has full-rank with probability one. Further, since SVT^ acts by 
applying soft-thresholding to each singular value, it seems natural that matrices with at least one 
singular value equal to A will be a cause of concern. Consider then 



: X is simple, has full rank and no singular value exactly equal to A}. 



It is clear that the complement of J- has Lebesgue measure zero. Consequently, we can restrict 
ourselves to the open set J- to look for a divergence. The result below — a corollary of the general 
Theorem 14.31 — establishes that SVT is differentiable over and thus we have a divergence in the 
usual sense. 

Corollary 3.4. The mapping SVT;s^ is differentiable over T and we have 

min(m,n) ^ / \ \ i min(r?i,n) 



div(SVTA(X))= Yl 



i=l 



I{ai > A} + |m - n| 1 



A 



+2 E M 
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Since SVT^ is of the form (jl.lip with 



f,{a) = ia-XU, //(a) = I{(7 > A}, (3.4) 

for i = 1, . . . ,min(m,n) and <t 7^ A, we see that it is differentiable at the singular value matrix of 
any element of J-. Because the elements of J-' are also simple and full-rank, we conclude SVT;^ is 
differentiable over by Lemma 14.11 Therefore, the proof of (13. 3p is a special case of the general 
Theorem 14.31 from Section HI 



3.2 The complex case 



In the complex case, we need to analyze the conditions for SURE with respect to model (]1.9 
which can be expressed in vector form as 



'Re{Y) 




"Re(Xo)" 




'Re{W) 


lm{Y) 




Im(Xo) 


+ 


lmlw)_ 



(3.5) 



We thus need to study the existence of the weak partial derivatives 



d 



dRe{Yij) 



Re{SVTx)tj and 



d 



-Im(SVT 



XJij, 



and whether they satisfy our integrability conditions, as these are the only ones that play a role in 
the divergence. 

Lemma 3.5. The mappings SVTx, Re(SVT;^) and Im(SVT;v) obey the assumptions of Proposi- 
tion[3l[ 



Proof As in the real case, SVT;^ is the prox-function of the nuclear norm and is thus Lipschitz 
and non-expansive (see Proposition 2.27 in [l]). Hence, the real and imaginary parts are Lipschitz. 
Applying Lemma 13.21 to both the real and imaginary components proves the claim. ■ 



For the divergence, we write 

div(SVTA(X)) = divRe(x) (Re(SVTA)(X)) +divi„(x) (Im(SVTA)(X)). 



(3.6) 



As in the real case, we can discard sets of Lebesgue measure zero. Consider then the analogue of 
J^, whose complement has Lebesgue measure zero. 

Corollary 3.6. The mapping SVT^ is differentiable over T , and we have 

min(m,n) / \ \ n min{m,n) 



div(SVTA(X))= Yl 



i=l 



I{ai > X} + {2\m-n\ + I) { 1 



A 



+4 E 



The argument regarding the differentiability of SVT^ applies here as well. The proof of the 
closed- form expression for the divergence is a special case of the general Theorem 14.51 from Section [4] 
with fi as in ([37 
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4 Differentiability of Spectral Functions: General SURE Formulas 

This section introduces some results concerning the differentiability of spectral functionsJll To 
simplify the exposition, we work with m x n matrices X with m > n, as all our arguments apply 
with minor modifications to X*. Below, j-jgj is the canonical basis of M"^™, Inxn is the 

n-dimensional identity matrix and we set 

J- nxn 



In the real and complex cases, we make use of the reduced and full SVD. The reduced SVD is 
written as X = LfSV*, where U \s mxn with orthonormal columns and V is unitary. In the full 
SVD, X = UTiV* where U is m x m and extends U to a unitary matrix. Finally, we recall from 
Section [1] that a spectral function is of the form f{X) = J7/(S)V* with 

/(S) = diag(/i(ai),...,/„K)), (4.1) 

where we assume that each /j : i— t- is differentiable. We note that we can relax the 
differentiability assumption and only require differentiability in a neighborhood of the spectrum of 
the matrix under study. 

To find a closed- form for the divergence of /, we first need to determine whether / is differ- 
entiable. Using notation from differential calculus (see [lO]), recall that / is differentiable at X if 
there exists a linear mapping dfx, called the differential of / at X, such that for all A, 

\\f{X + A)-f{X)-dfx[A]\\F ^ 
A-)-0 II^IIf 
Differentiation obeys the standard rules of calculus and we have 

d/[A] = di7[A] /(S) V* + Ud{fo 5])[A] V* + U /(S) dV[A]*. (4.2) 

where, to lighten the notation above, we have not indicated the point X at which the differential 
is calculated (we shall use subscripts whenever necessary). By the chain rule, d(/ o 5])x[A] = 
d/s[dSx[A]] so that the differentiability of / depends upon the existence of differentials for U, 
V and at X together with that of / at S. 

Standard analysis arguments immediately establish that the function / is differentiable at S 
whenever S is simple, as in this case there is no ambiguity as to which function /j is being applied to 
the singular values. Furthermore, the differentiability of the SVD of a simple matrix with full-rank 
is a consequence of Theorem 1 and Theorem 2 in [23] applied to X*X and XX* (see also |16j 
and [33] )• We summarize these arguments in the following lemma. 

Lemma 4.1. Let X = U'^V* he a simple and full-rank matrix and f he a spectral function. 

1. The factors of the SVD are differentiahle in a neighhorhood of X . 

2. If f is differentiahle at then it is differentiahle at X. 

We now focus on determining the divergence of / in closed-form. Since there are differences 
between the real- and complex-valued cases, we treat them separately. 



^Here, we set aside any statistical interpretation to focus on differentiability and closed-form expression for the 
divergence of spectral functions. 



18 



4.1 The real case 



Lemma 14.11 asserts that the SVD is differentiable at any simple matrix X with fuh-rank. The 
fohowing result shows that we can determine the differentials of each factor in closed-form. 

Lemma 4.2. Let X be simple and full-rank. Then the differentials ofU and V are given by 

dU[A\ = Un^[A], dF[A] = VClv[A]*, 

where fl^ and fly o'^e given in closed-form (j4.8p (dU is independent of the choice ofU). Finally, 
dSl is given by (j4.7p . 



Proof We follow the same method as in [10], which also appears in [27]. Let X G 
For any matrix A G M™^"", we have 

dX[A] = dJ7[A]SF* + Ud^[A]V* + UmV[A]*. 

Since U and V are orthogonal, it follows that 

tj*dX[A]V = tj*dU[A]T, + U*UdT,[A] + [/*J7EdF[A]*y. 

Introduce the matrices 

n^ylA] = U*dU[A] and fi^IA] = dV[A]*V, 

so that 



(m > n). 
(4.3) 



(4.4) 



WAV = nj^[A]S + /„xnd5][A] + I„xnSnv[A], 
where we used the fact that dX is the identity, whence dX[A] = A. Moreover, 

= dI„xn(A) = V*dV[A] + dF[A]*F =^ V*dV[A] = -dV[A]*V, 

which says that riv[A] is anti-symmetric. Similarly, 

= d/„xn(A) = U*dU[A] + dU[A]*U =^ U*dU[A] = -dU[A]*U, 

and thus the upper nx n block of r2jy[A] is also anti-symmetric. This fact and the skew-symmetry 
of riv[A] yield the relations 



{U*AV),, 
-{U*AV)ji 



l,...,n, 



0, 

0, i=l,...,n, 



1, . . . ,n, 
1, . . . ,n. 



and 

In particular. 



dai[A] = {U* AV)u, i = l,...,n. 
5|) and ()4.6p can be summarized in the system of linear equations 



(4.5) 
(4.6) 

(4.7) 









[A]- 




" {U*AV),j ' 






fl.V,ij 


[A]. 




_-{U*AV)j,_ 
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Since X is simple, the coefficient matrix is invertible, and thus 





[A]l 










'{U*AV)ii 




[A]. 













iy^j, ^,i = l, (4.8) 



These relations determine both dS[A], riv[A] and the upper n x n block of r2jy[A] completely. 
The lower (m — n) x n block of rijy[A] is determined from (j4.4p . as 

{U*AV)ij = njy^..[A]aj, i = n + l,...,m, j = l,...,n. (4.9) 

Since X has full-rank, this completely determines Q^[A]. We conclude that (i) d5][A], fl^fA] and 
fif7[A] are uniquely characterized by the SVD of X, and (ii) they act linearly on A. Furthermore, 
the differentials obey (j4.3|) . We now prove that d?7[A] is independent of the choice of U. Decompose 
rijy[A] and IJ as 

and U =[U Q], 

where ri;7[A] denotes the upper n x n block of r2jj[A] (which depends only on U by (j4.8p l and 
riQ[A] denotes the lower {m — n) x n block of fijj[A] (which by (|4.9p equals Q* AVE^^). Then 

dU[A] = Ufl(j[A] = Uftui^] + Q^q[A] = UflulA] + {QQ*)AV'E-\ 

which is well defined since the orthogonal projector QQ* is independent of the choice of Q. ■ 

Since Lemma 14.21 provides a closed- form expression for the differentials, computing the value of 
the divergence is now a matter of calculus. 

Theorem 4.3. Let f be a matrix-valued spectral function and suppose X £ ]^™x" ig simple and 
has full-rank. Then 

min(m,n) , fU \\ min(m,n) 

div(/(X))= (flia.) + \m-n\l^)+2 (4.10) 

1=1 ^ ' 2^J,JJ = 1 * J 

Proof Assume m > n without loss of generality. Let {ui}^^ and {vi}^^^ denote the columns of 
U and V, and set 

A'^=uiv*, {i,j)el. 

Then { A^-^j^j j)gj is an orthonormal basis for M"*x" and, furthermore, U*A^^V = E'^K Since 
{A*-'}(j j)gx is an orthonormal basis, 

div(/(X))= (A^^d/[A'J]>. (4.11) 

(i,i)ex 

Note that yields 

d/[A*^'] = J7*(nj^[A^^l/(I]) + J„xnd(/o S)[A^Jl + /^xn/(5])fiv[A^^DF*. 
Next, from ()4.7p we obtain 

dcrfc[A*^] = 5fci(5fcj, {i,j)el, k = l,...,n. 



n^[A] 



fig [A] 
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By the same arguments, from (j4.8p we obtain 





1 



















and from ()4.9p . 



^ik^jl 



el, k,l = 1,. . . ,n, k^l, 



k = n + l,...,m, l = l,...,n. 



Now follow (j4.1ip and decompose the divergence as 
div(/(X)) = Yl (^■'^"t>[A*^]/(S) + /™x„d(/oE)[A*J]+I^xn/(S)nv[A^^]> 

= S^ + S'^ + S^. 



We see that is equal to 



2 2 ' 



whereas 



ij=l jj=l 



i=l 



Finally, decomposes as 



i,j=l i=n+l j=l 



Using (|4.8p . the first term is equal to 



fji^j) 

while it follows from ()4.9p that the second equals 



«J=1 



o'ifijo'i) 

2 2 ' 

..... .crf-cr^ 



m—n n 



E E(^^''"f/[^^']/(^)> = EE<^^''^'''^"'/(^)) 

i=n+l J=l 4=1 i=l 



[m — n 



i=l 



Since n = min (m,n), we conclude that 

min(m,n) 

div{f{X)) = S^ + S^ + S^= E + 

1=1 



m — n 



min(m,n) 



+ 2 E 
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4.2 The complex case 

Since the singular values are not analytic functions of the entries, we can only consider the deriva- 
tives of / as a function of the real and imaginary parts of the entries of X. Therefore, we identify 
^mxn ^j^j^ j^mxn ^ j^mxn j with a complex- Valued function of 2mn variables. This has the 
particular consequence that the differential df at X £ ([^^rixn — whenever it exists — obeys 

d/x[A] = d/x[Re(A)] +d/x[^Im(A)], 

for all A G (j^mxn. j^gj-g^ ^ ^j^g imaginary unit. The differential seen as a function with domain 
j^mxn. j^mxn. -g q£ gQ^j-gg linear. Lemma 14.21 mav be adapted to the complex setting with minor 
modifications. 

Lemma 4.4. Let X be simple and full-rank. Then the differentials of U and V are given by 

dU[A] = Uft^[A], dV[A] = Vnv[A]*, 

where CIq and fiv o'^e given in closed-form (j4.15p (dU is independent of the choice ofU). Further, 
dS is given by (j4.13p . For completeness, fljj,^Y ■ C™^" i— )■ 



Proof Following the same steps as in the proof of Lemma l4.2t let X G C^' 
that ()4.4p becomes 

WAV = fij>[A]S + /„xndI][A] + I^xnSnv[A], 



{m > n). We see 
(4.12) 



where and the upper n x n block of ft^ are now skew-Hermitian. Since these matrices have 
purely imaginary entries on the diagonal, 



da^[A]=Re{{U*AV)^^} 



{U*AV)^^ + {U*AV)^ 



l,...,n. 



(4.13) 



and 



n^JAlai + nv,ii[^]ai = lm{{U*AV)ii}i 



{U*AV)ii - {U*AV) 



l,...,n. (4.14) 



Furthermore, it is not hard to see that the system (j4.8p becomes 





[A]l 










'{U*AV)ii 


^V,ij 






-CTi 






[{U*AV)ji\ 



i,j = l,...,n, 



n+l,...,m, j = 1, 



, n. 



(4.15) 



(4.16) 



and the expression equivalent to (|4.9p is 

{U^AV),,=n^^^^[A]a,, 

As in the real case, this shows dS[A], riv[A] and ri(7[A] act linearly on A, and that they are 
umquelycharacterizeci by the SVD of X. Furthermore, the same argument allows us to show that 
the differential of U is independent of the choice of U, concluding the proof. ■ 



^The attentive reader will notice that (|4.14p leaves us with one degree of freedom, and thus does not determine 
exactly the differentials. This is not a problem, and it is a consequence of the freedom of choice of a phase factor for 
the singular vectors. 
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As in the real case, we restrict attention to a simple and full-rank X G £;mxn_ need 
to determine two terms for the divergence as discussed in Section 13.21 Now if {A*''}(j is an 
orthonormal basis for M*"^", then {zA^-'lj-j j^gj is an orthonormal basis for iM™^". With the 
divergence as in (]3.6p . 

div(/(X)) = Re((A*^d/[A*^]»+ Im((A*^d/[^A^^■]» 

[ij)ex ii,j)ex 

= Rel (A^^d/[A*^]-.d/[^A*^]> 
The analogue of Theorem 14.31 is this: 

Theorem 4.5. Let f be a matrix-valued spectral function and suppose X G c^x™ simple and 
has full-rank. Then 



min(r7i,n) , f f \\ min(m,n) , . 

div(/(X))= Y (/>.0 + (2|m-n| + l)^)+4 Y "^"^ 



(4.17) 



Proof Since X is simple and has full-rank, Lemma 14.41 holds, and 



df[E'^] - zdf[E^^] = U [{n^[E^^] - in^[zE^^])f{^) + I^^^f{'E)inv[E'^ - znviiE^'])] V* 

+ UImxndf^[di:[E'^] - zdi:[iE'^]]V*. (4.18) 

Put = tj*E'W for {i,j) el for short. From (gTS]), we get 

dai[E'^] - idai[iE'^] = L]^,^ 
for G I and k = 1, . . . ,n. By the same arguments, ()4.15p gives 





-tn^jzE'^]- 


1 














-cjfc -ai 








for (i, j) £ I, k,l = 1, . . . ,n, and k ^ I. Next, ()4.16p gives 



TV 



for (i, j) G X, /c = n -|- 1, . . . , m, and / = 1, . . . , n. Finally, (|4.14|) implies 



for A: = 1, . . . , n. Now, set 



5- = J] (r^(n^[£;^'^]-^n^[^£;^^])/(S)), 
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Then it is clear that 



k^l,k,l=l '^1' k=l 

and 

n n 
{i,j)£Xk=l k=l 

Finally, 

^^ = - E E ^%^^+E/^(-^) E Wki^u,kki^''^-^^u,kki^^''^) 

{i,j)elkj^l,k,l=l ^ fc=l (ij)GX 



(jj)6Xfc=n+l «=1 



k^l,k,l=l ' fc=l fc=l («,j)GX 

Since 

we conclude 

div(/(X)) = Re(^S^ + S^ + S^ 

min(?ri.,n) . ^ / \ \ min(m,n) 



E + (2|m-n| + l)^)+ 4 ^ ^ 

where we used n = min(m, n). 



4.3 An extension to all matrix arguments 

Theorems 14.31 and 14.51 provide the divergence of a matrix-valued spectral function in closed-form 
for simple, full-rank arguments. Here, we indicate that these formulas have a continuous extension 
to all matrices. Before continuing, we introduce some notation. For a given X, we denote as k < n 
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the number of distinct singular values while Si and dj, i = 1, . . . , k, denote the i-th distinct singular 
value and its multiplicity. 

As previously discussed, when I] is simple there is no ambiguity as to which /j is being applied 
to a singular value. However, when 'S is not simple, the ambiguities make / nondifferentiable unless 
/i = / for i = l,...,n. 

Theorem 4.6. Let f be a matrix-valued spectral function, with fi = f for i = l,...,n, and 
/(O) = 0. Then the divergence extends by continuity to any X € M™^", and is given by 



div(/(X)) = Yl 

i: Si> 



di + 



i: Si=0 



(|m - n| + l)di + 2 



/'(0) + 2 Yl did. 



Sif{Si) 



^ s"^ -s^' 



(4.19) 



and to any X £ C^' 
div(/(X)) 



E 

i: Si>0 

E 

i: Si=0 



di + 2{ ']] f'is,) + (2|m - n| + l)d, + 2 



2 



2{\m-n\ + l)di +4 



/'(0)+4 ^ did, 



Sif{Si 



^ s'^ -s"^' 



(4.20) 



Proof We only prove (j4.19p as the same arguments apply to ()4.20p . The key idea involves non- 
tangential limits defined below. Suppose we have a sequence {Xk} of full-rank matrices with 
X^ — y X^ as k —7- oo; we establish that the limit div(/(Xfc)) is independent of the sequence. 
Since ()4.10p depends only on the singular values of X^, we can focus on the sequence of singular 
values. Let {crj}"^^ be the singular values of X, and let {a^}f^i be those of X^. We have 



min(m,n) , 

div(/(Xfe))= Y + 
1=1 ^ 

The first terms are easy to analyze. In fact. 



f(n^)\ ™"("^'") ^kf(^k 



k\2 ' 



lim fiat) = f'{ai) and lim M 



if fjj > 0, 
/'(O), ifa, = 0. 



The remaining term can be decomposed as 



min(m,n) , , 

</(af 



min(m,,?i) min(m..n) 

E + E 



(af )2 _ (^fc)2 



:=/o + /l. 



(4.21) 



We now formalize the notion of non-tangential limit. Consider a pair {i,j) with i ^ j and 

= — o", 5j = (Tj — fj and p^j- = 5^ /5i . 



ai = aj := a, and define 
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We say that approaches X non-tangentially if there exists a constant > such that Ip^^- — 1| > 
Cij. Roughly speaking, the matrices Xj. approach X fast enough, so that they remain simple, and 
have full-rank. We have 



(cjf )2 - (a^fc)2 2a + 5f + 6^ \ 6^ - 5[ 



k I 

i / 

k 



We distinguish two cases. If a > 0, we see that 



whereas when a = we obtain 

lim ' ' ^ ^ = f'(0) 

fc^oo {afy - (cj|)2 

and, therefore. 

The second sum Ii in (j4.2ip is easier to analyze, as it can be seen that 
concluding the proof. 



5 Discussion 

Beyond MRI, SVT denoising also has potential application for medical imaging modalities outside of 
MRI. For example, in perfusion X-ray computed tomography (CT) studies, where some anatomical 
region (e.g., the kidneys) is repeatedly scanned to visualize hemodynamics. X-ray tube energy 
is routinely lowered to limit the ionizing radiation dose delivered to the patient. However, this 
can result in a substantial increase of image noise. Recent works (e.g., [1]) have shown that 
retrospective denoising of low-dose CT data can often yield diagnostic information comparable to 
full-dose images. It may be possible to adapt SVT, and the proposed risk estimators, for the CT 
dose reduction problem as well. 

Following the successful application of compressive sensing theory [71 [9] for accelerated MRI 
(e.g., [201 135j). the use of low-rank constraints for undersampled MRI reconstruction is also of 
increasing interest. For the dynamic and parametric acquisition scenarios described above, the 
generalization from denoising to undersampled reconstruction is straightforward in light of recent 
developments on the topic of matrix completion |6l[29]. To date, low-rank reconstruction methods 
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have been deployed for accelerated cardiac imaging [131 13 ES] and dynamic contrast-enhanced 
MRI (DCE-MRI) of the breast [13] . More recently, it has been demonstrated that reconstructing 
accelerated parallel MRI data under low-rank constraints |2HI37] can obviate the need for expensive 
and potentially error-prone patient-specific calibration procedures as well as facilitate previously 
impractical acquisition strategies. As with the denoising application, parameter selection is an 
outstanding challenge in undersampled reconstruction problems. Unlike the denoising problem, 
only incomplete noisy data is available and Stein's method is not directly applicable. Recently, for 
undersampled MRI reconstruction, Ramani et al. [28] demonstrated that Stein's method can be used 
to generate an unbiased risk estimator for the predicted MSE of ^i-minimization reconstructions, 
and that this can be used for automated parameter optimization. An interesting future direction 
of research will lie in determining if the models developed in this work facilitate development of a 
similar framework for the low-rank matrix recovery problem. 
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